AITopics | Nusa Tenggara Islands

Collaborating Authors

Nusa Tenggara Islands

Enhancing Poverty Targeting with Spatial Machine Learning: An application to Indonesia

Martinez, Rolando Gonzales, Cooray, Mariza

arXiv.org Machine LearningMar-6-2025

This study leverages spatial machine learning (SML) to enhance the accuracy of Proxy Means Testing (PMT) for poverty targeting in Indonesia. Conventional PMT methodologies are prone to exclusion and inclusion errors due to their inability to account for spatial dependencies and regional heterogeneity. By integrating spatial contiguity matrices, SML models mitigate these limitations, facilitating a more precise identification and comparison of geographical poverty clusters. Utilizing household survey data from the Social Welfare Integrated Data Survey (DTKS) for the periods 2016 to 2020 and 2016 to 2021, this study examines spatial patterns in income distribution and delineates poverty clusters at both provincial and district levels. Empirical findings indicate that the proposed SML approach reduces exclusion errors from 28% to 20% compared to standard machine learning models, underscoring the critical role of spatial analysis in refining machine learning-based poverty targeting. These results highlight the potential of SML to inform the design of more equitable and effective social protection policies, particularly in geographically diverse contexts. Future research can explore the applicability of spatiotemporal models and assess the generalizability of SML approaches across varying socio-economic settings.

exclusion error, inclusion error, spatial machine, (12 more...)

arXiv.org Machine Learning

2503.043

Country:

North America > United States (0.05)
Asia > Indonesia > Nusa Tenggara Islands (0.05)
Asia > Indonesia > Sumatra > Bengkulu > Bengkulu (0.04)
(17 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Add feedback

NusaAksara: A Multimodal and Multilingual Benchmark for Preserving Indonesian Indigenous Scripts

Adilazuarda, Muhammad Farid, Wijanarko, Musa Izzanardi, Susanto, Lucky, Nur'aini, Khumaisa, Wijaya, Derry, Aji, Alham Fikri

arXiv.org Artificial IntelligenceFeb-25-2025

Indonesia is rich in languages and scripts. However, most NLP progress has been made using romanized text. In this paper, we present NusaAksara, a novel public benchmark for Indonesian languages that includes their original scripts. Our benchmark covers both text and image modalities and encompasses diverse tasks such as image segmentation, OCR, transliteration, translation, and language identification. Our data is constructed by human experts through rigorous steps. NusaAksara covers 8 scripts across 7 languages, including low-resource languages not commonly seen in NLP benchmarks. Although unsupported by Unicode, the Lampung script is included in this dataset. We benchmark our data across several models, from LLMs and VLMs such as GPT-4o, Llama 3.2, and Aya 23 to task-specific systems such as PP-OCR and LangID, and show that most NLP technologies cannot handle Indonesia's local scripts, with many achieving near-zero performance.

computational linguistic, local script, proceedings, (14 more...)

arXiv.org Artificial Intelligence

2502.18148

Country:

Asia > Indonesia > Bali (0.05)
Asia > Southeast Asia (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
(32 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Learning-based estimation of cattle weight gain and its influencing factors

Hossain, Muhammad Riaz Hasib, Islam, Rafiqul, McGrath, Shawn R., Islam, Md Zahidul, Lamb, David

arXiv.org Artificial IntelligenceFeb-9-2025

Many cattle farmers still depend on manual methods to measure the live weight gain of cattle at set intervals, which is time consuming, labour intensive, and stressful for both the animals and handlers. A remote and autonomous monitoring system using machine learning (ML) or deep learning (DL) can provide a more efficient and less invasive method and also predictive capabilities for future cattle weight gain (CWG). This system allows continuous monitoring and estimation of individual cattle live weight gain, growth rates and weight fluctuations considering various factors like environmental conditions, genetic predispositions, feed availability, movement patterns and behaviour. Several researchers have explored the efficiency of estimating CWG using ML and DL algorithms. However, estimating CWG suffers from a lack of consistency in its application. Moreover, ML or DL can provide weight gain estimations based on several features that vary in existing research. Additionally, previous studies have encountered various data related challenges when estimating CWG. This paper presents a comprehensive investigation in estimating CWG using advanced ML techniques based on research articles (between 2004 and 2024). This study investigates the current tools, methods, and features used in CWG estimation, as well as their strengths and weaknesses. The findings highlight the significance of using advanced ML approaches in CWG estimation and its critical influence on factors. Furthermore, this study identifies potential research gaps and provides research direction on CWG prediction, which serves as a reference for future research in this area.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.compag.2025.110033

2502.06906

Country:

North America > United States (0.14)
Asia > Indonesia > Bali (0.04)
Oceania > New Zealand (0.04)
(9 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Research Report > Experimental Study (0.67)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Therapeutic Area > Nutrition and Weight Loss (1.00)
Health & Medicine > Therapeutic Area > Endocrinology (1.00)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.93)

Add feedback

LLM for Everyone: Representing the Underrepresented in Large Language Models

Cahyawijaya, Samuel

arXiv.org Artificial IntelligenceSep-20-2024

Natural language processing (NLP) has witnessed a profound impact of large language models (LLMs) that excel in a multitude of tasks. However, the limitation of LLMs in multilingual settings, particularly in underrepresented languages, remains a significant hurdle. This thesis aims to bridge the gap in NLP research and development by focusing on underrepresented languages. A comprehensive evaluation of LLMs is conducted to assess their capabilities in these languages, revealing the challenges of multilingual and multicultural generalization. Addressing the multilingual generalization gap, this thesis proposes data-and-compute-efficient methods to mitigate the disparity in LLM ability in underrepresented languages, allowing better generalization on underrepresented languages without the loss of task generalization ability. The proposed solutions cover cross-lingual continual instruction tuning, retrieval-based cross-lingual in-context learning, and in-context query alignment. Furthermore, a novel method to measure cultural values alignment between LLMs operating in different languages is proposed, ensuring cultural sensitivity and inclusivity. These contributions aim to enhance the multilingual and multicultural alignment of LLMs in underrepresented languages, ultimately advancing the NLP field toward greater equality and inclusiveness.

cross-lingual semantic similarity, gurevych and yusuke miyao, international publishing, (17 more...)

arXiv.org Artificial Intelligence

2409.13897

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Indonesia > Bali (0.04)
Asia > Middle East > Jordan (0.04)
(62 more...)

Genre:

Research Report > Promising Solution (1.00)
Research Report > New Finding (1.00)
Overview (1.00)
Instructional Material (1.00)

Industry:

Law (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Government (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

IndoCulture: Exploring Geographically-Influenced Cultural Commonsense Reasoning Across Eleven Indonesian Provinces

Koto, Fajri, Mahendra, Rahmad, Aisyah, Nurul, Baldwin, Timothy

arXiv.org Artificial IntelligenceApr-2-2024

Although commonsense reasoning is greatly shaped by cultural and geographical factors, previous studies on language models have predominantly centered on English cultures, potentially resulting in an Anglocentric bias. In this paper, we introduce IndoCulture, aimed at understanding the influence of geographical factors on language model reasoning ability, with a specific emphasis on the diverse cultures found within eleven Indonesian provinces. In contrast to prior works that relied on templates (Yin et al., 2022) and online scrapping (Fung et al., 2024), we created IndoCulture by asking local people to manually develop the context and plausible options based on predefined topics. Evaluations of 23 language models reveal several insights: (1) even the best open-source model struggles with an accuracy of 53.2%, (2) models often provide more accurate predictions for specific provinces, such as Bali and West Java, and (3) the inclusion of location contexts enhances performance, especially in larger models like GPT-4, emphasizing the significance of geographical context in commonsense reasoning.

indoculture, language model, province, (13 more...)

arXiv.org Artificial Intelligence

2404.01854

Country:

Asia > Indonesia > Bali (0.25)
Asia > Indonesia > Java > West Java (0.25)
Asia > Indonesia > Sumatra > West Sumatra (0.05)
(24 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine (0.68)
Education (0.47)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

Constructing and Expanding Low-Resource and Underrepresented Parallel Datasets for Indonesian Local Languages

Lopo, Joanito Agili, Tanone, Radius

arXiv.org Artificial IntelligenceApr-1-2024

In Indonesia, local languages play an integral role in the culture. However, the available Indonesian language resources still fall into the category of limited data in the Natural Language Processing (NLP) field. This is become problematic when build NLP model for these languages. To address this gap, we introduce Bhinneka Korpus, a multilingual parallel corpus featuring five Indonesian local languages. Our goal is to enhance access and utilization of these resources, extending their reach within the country. We explained in a detail the dataset collection process and associated challenges. Additionally, we experimented with translation task using the IBM Model 1 due to data constraints. The result showed that the performance of each language already shows good indications for further development. Challenges such as lexical variation, smoothing effects, and cross-linguistic variability are discussed. We intend to evaluate the corpus using advanced NLP techniques for low-resource languages, paving the way for multilingual translation models.

annotator, indonesia, translation, (16 more...)

arXiv.org Artificial Intelligence

2404.01009

Country:

Asia > Indonesia > East Nusa Tenggara > Kupang (0.07)
Asia > Indonesia > Sulawesi > South Sulawesi > Makassar (0.05)
Asia > Indonesia > Java > Jakarta > Jakarta (0.04)
(24 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology (0.48)
Education (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

BHASA: A Holistic Southeast Asian Linguistic and Cultural Evaluation Suite for Large Language Models

Leong, Wei Qi, Ngui, Jian Gang, Susanto, Yosephine, Rengarajan, Hamsawardhini, Sarveswaran, Kengatharaiyer, Tjhi, William Chandra

arXiv.org Artificial IntelligenceSep-18-2023

The rapid development of Large Language Models (LLMs) and the emergence of novel abilities with scale have necessitated the construction of holistic, diverse and challenging benchmarks such as HELM and BIG-bench. However, at the moment, most of these benchmarks focus only on performance in English and evaluations that include Southeast Asian (SEA) languages are few in number. We therefore propose BHASA, a holistic linguistic and cultural evaluation suite for LLMs in SEA languages. It comprises three components: (1) a NLP benchmark covering eight tasks across Natural Language Understanding (NLU), Generation (NLG) and Reasoning (NLR) tasks, (2) LINDSEA, a linguistic diagnostic toolkit that spans the gamut of linguistic phenomena including syntax, semantics and pragmatics, and (3) a cultural diagnostics dataset that probes for both cultural representation and sensitivity. For this preliminary effort, we implement the NLP benchmark only for Indonesian, Vietnamese, Thai and Tamil, and we only include Indonesian and Tamil for LINDSEA and the cultural diagnostics dataset. As GPT-4 is purportedly one of the best-performing multilingual LLMs at the moment, we use it as a yardstick to gauge the capabilities of LLMs in the context of SEA languages. Our initial experiments on GPT-4 with BHASA find it lacking in various aspects of linguistic capabilities, cultural representation and sensitivity in the targeted SEA languages. BHASA is a work in progress and will continue to be improved and expanded in the future.

abstractive summarization, computational linguistics, nlp benchmark component, (14 more...)

arXiv.org Artificial Intelligence

2309.06085

Country:

North America > United States > Montana (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.13)
Asia > Singapore (0.04)
(50 more...)

Genre:

Overview (0.92)
Personal (0.92)
Research Report > New Finding (0.67)

Industry:

Health & Medicine (1.00)
Government (1.00)
Education (0.92)
Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

NusaCrowd: Open Source Initiative for Indonesian NLP Resources

Cahyawijaya, Samuel, Lovenia, Holy, Aji, Alham Fikri, Winata, Genta Indra, Wilie, Bryan, Mahendra, Rahmad, Wibisono, Christian, Romadhony, Ade, Vincentio, Karissa, Koto, Fajri, Santoso, Jennifer, Moeljadi, David, Wirawan, Cahya, Hudi, Frederikus, Parmonangan, Ivan Halim, Alfina, Ika, Wicaksono, Muhammad Satrio, Putra, Ilham Firdausi, Rahmadani, Samsul, Oenang, Yulianti, Septiandri, Ali Akbar, Jaya, James, Dhole, Kaustubh D., Suryani, Arie Ardiyanti, Putri, Rifki Afina, Su, Dan, Stevens, Keith, Nityasya, Made Nindyatama, Adilazuarda, Muhammad Farid, Ignatius, Ryan, Diandaru, Ryandito, Yu, Tiezheng, Ghifari, Vito, Dai, Wenliang, Xu, Yan, Damapuspita, Dyah, Tho, Cuk, Karo, Ichwanul Muslim Karo, Fatyanosa, Tirana Noor, Ji, Ziwei, Fung, Pascale, Neubig, Graham, Baldwin, Timothy, Ruder, Sebastian, Sujaini, Herry, Sakti, Sakriani, Purwarianti, Ayu

arXiv.org Artificial IntelligenceJul-21-2023

We present NusaCrowd, a collaborative initiative to collect and unify existing resources for Indonesian languages, including opening access to previously non-public resources. Through this initiative, we have brought together 137 datasets and 118 standardized data loaders. The quality of the datasets has been assessed manually and automatically, and their value is demonstrated through multiple experiments. NusaCrowd's data collection enables the creation of the first zero-shot benchmarks for natural language understanding and generation in Indonesian and the local languages of Indonesia. Furthermore, NusaCrowd brings the creation of the first multilingual automatic speech recognition benchmark in Indonesian and the local languages of Indonesia. Our work strives to advance natural language processing (NLP) research for languages that are under-represented despite being widely spoken.

large language model, machine learning, natural language, (24 more...)

arXiv.org Artificial Intelligence

2212.09648

Country:

North America > United States > Texas > Dallas County > Dallas (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Timor-Leste (0.14)
(64 more...)

Genre: Research Report > New Finding (0.45)

Industry:

Law (0.67)
Government (0.67)
Information Technology > Services (0.67)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
(5 more...)

Add feedback

Satellite Monitoring of Terrestrial Plastic Waste

Kruse, Caleb, Boyda, Edward, Chen, Sully, Karra, Krishna, Bou-Nahra, Tristan, Hammer, Dan, Mathis, Jennifer, Maddalene, Taylor, Jambeck, Jenna, Laurier, Fabien

arXiv.org Artificial IntelligenceMar-24-2022

Plastic waste is a significant environmental pollutant that is difficult to monitor. We created a system of neural networks to analyze spectral, spatial, and temporal components of Sentinel-2 satellite data to identify terrestrial aggregations of waste. The system works at continental scale. We evaluated performance in Indonesia and detected 374 waste aggregations, more than double the number of sites found in public databases. The same system deployed across twelve countries in Southeast Asia identifies 996 subsequently confirmed waste sites. For each detected site, we algorithmically monitor waste site footprints through time and cross-reference other datasets to generate physical and social metadata. 19% of detected waste sites are located within 200 m of a waterway. Numerous sites sit directly on riverbanks, with high risk of ocean leakage.

artificial intelligence, classifier, machine learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1371/journal.pone.0278997

2204.01485

Country:

Asia > Southeast Asia (0.26)
Asia > Indonesia > Bali (0.05)
Asia > Sri Lanka (0.04)
(20 more...)

Genre: Research Report (0.40)

Industry:

Water & Waste Management > Solid Waste Management (1.00)
Energy (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Architecture (0.93)
(2 more...)

Add feedback

New facial recognition technology caught 'imposter' using someone else's passport, US officials say

The Independent - TechAug-24-2018, 21:58:53 GMT

A new facial recognition technology caught a man trying to enter the US using a passport belonging to someone else, US officials say. Officials with the US Customs and Border Protection (CBP) and the Office of Field Operations (OFO) intercepted a 26-year-old man, the agencies referred to as an "imposter", who reportedly attempted to use a French passport belonging to someone else, at Washington's Dulles International Airport. The man was travelling to the US from Brazil. "The officer utilised CBP's new facial comparison biometric technology which confirmed the man was not a match to the passport he presented," the CBP press release read. It added: "A search revealed the man's authentic Republic of Congo identification card concealed in his shoe."

artificial intelligence, province, soccer coach, (15 more...)

The Independent - Tech

Country:

Asia > South Korea (0.48)
Asia > North Korea (0.28)
South America > Brazil (0.24)
(42 more...)

Genre: Press Release (0.34)

Industry:

Transportation > Air (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology: Information Technology > Artificial Intelligence > Vision > Face Recognition (0.61)

Add feedback